System and method for decreasing end-to-end delay during video conferencing session

ABSTRACT

A method for decreasing end-to-end delay in a video conferencing context is disclosed. At video conferencing system startup, a processor is initialized to receive either a top field or a bottom field of video frame data. If the first line of a new field arriving after initialization does not match a field state that the processor is initialized to, the present invention senses the state mismatch and adjusts a display buffer by one display line, and the field is stored in the display buffer. The display buffer is adjusted in order to preserve a vertical spatial relationship between the top and bottom fields.

RELATED APPLICATIONS

[0001] This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 60/384,606, filed May 31, 2002, which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to video conference systems, and more particularly to decreasing end-to-end delay during video conferencing sessions.

[0004] 2. Description of the Related Art

[0005] The well-known National Television Standards Committee (NTSC) and Phase Alternating Line (PAL) television standards are employed by video cameras and monitors to capture and display video information for consumer applications. Both NTSC and PAL cameras and monitors capture and display video information in an interlaced format. Interlacing refers to a method of capturing two fields of video information per frame. One half of the vertical resolution of a frame (i.e., every other horizontal line) is captured in a first or “top” field. The remaining half of the vertical resolution of the frame is captured in a second or “bottom” field. Each frame of a video picture produced by an NTSC camera or displayed by an NTSC monitor is displayed in a 480-line format with each line having 720 pixels, while the PAL format is displayed in 576 lines. NTSC video is transmitted at 60 fields per second, and PAL video is transmitted at 50 fields per second. Adaptations of these formats have been adopted for emerging high-definition television as well.
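For illustration, the interleaving just described can be expressed in C. The following sketch is illustrative only (the routine name and the fixed NTSC dimensions are assumptions): the top field supplies the even-numbered frame lines and the bottom field the odd-numbered lines, so the two fields sit one line apart vertically.

```c
#include <string.h>

#define FRAME_LINES 480   /* NTSC active lines; 576 for PAL */
#define LINE_PIXELS 720   /* pixels per line */

/* Copy one captured field into the interleaved frame. The top field
 * supplies the even frame lines (0, 2, 4, ...); the bottom field
 * supplies the odd frame lines (1, 3, 5, ...). */
void weave_field(unsigned char frame[FRAME_LINES][LINE_PIXELS],
                 const unsigned char field[FRAME_LINES / 2][LINE_PIXELS],
                 int is_top_field)
{
    int start = is_top_field ? 0 : 1;   /* bottom field sits one line down */
    for (int i = 0; i < FRAME_LINES / 2; i++)
        memcpy(frame[start + 2 * i], field[i], LINE_PIXELS);
}
```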

[0006] Typically, NTSC or PAL cameras and monitors are used in conjunction with video conferencing systems that implement the International Telecommunications Union (ITU) Telecommunications (ITU-T) H.263 standard (incorporated herein by reference in its entirety, including all annexes, appendices, and subparts thereof), since such devices are much less expensive than equipment that captures video information using progressive (non-interlaced) scan technology. Until recently, however, the H.263 standard did not directly support interlaced video transmission, but supported Common Intermediate Format (CIF), which is a non-interlaced frame consisting of 288 lines of 352 pixels each. The transmission rate for CIF video can be as high as 30 frames per second. Thus, video conference systems had to convert from NTSC (or PAL) into CIF before coding each input video frame. Such a conversion discards some spatial and temporal information, and thus degrades the picture quality. In this context, the “spatial information” is the pixels in both the vertical and horizontal directions that are not included in the CIF frame. Likewise, the discarded “temporal information” represents the fact that a 50 or 60 field per second transmission of the NTSC or PAL standard is down-sampled to 30 fps in the CIF format.
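The information discarded by this conversion can be quantified with simple arithmetic, as the illustrative C sketch below does for an NTSC source: a CIF frame keeps roughly 29% of the full-frame pixels, and the 60 field per second input is halved to 30 fps.

```c
#include <stdio.h>

int main(void)
{
    /* NTSC source versus CIF target, per the dimensions cited above. */
    const double ntsc_pixels = 480.0 * 720.0;   /* full NTSC frame   */
    const double cif_pixels  = 288.0 * 352.0;   /* CIF frame         */
    const double ntsc_rate   = 60.0;            /* fields per second */
    const double cif_rate    = 30.0;            /* frames per second */

    printf("spatial information kept: %.1f%%\n",
           100.0 * cif_pixels / ntsc_pixels);   /* about 29.3% */
    printf("temporal rate: %.0f of %.0f per second\n",
           cif_rate, ntsc_rate);
    return 0;
}
```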

[0007] In recent years, the cost of hardware and transmission bandwidth required for coding and transmitting interlaced video pictures has decreased. It is now considered economically practical for a video conferencing system to code interlaced pictures with the full spatial dimension of NTSC or PAL input sources. The ITU has addressed this change in technology by adding Annex W to the H.263 standard.

[0008] Annex W describes how interlaced video signals can be encoded and decoded when transmitted in a single stream (or channel) of video information. The Annex W video encoding (or simply “coding”) scheme utilizes a reference frame from one field to predict a picture of another field. However, a top field in an interlaced video transmission scheme is a poor predictor of a bottom field, and vice versa. Thus, using the top field to predict the bottom field can lead to poor picture quality during times of low motion.

[0009] This particular form of picture quality degradation is due to the fact that the camera creates a complete picture frame by first scanning for top field information and then scanning for bottom field information. Each field is thus separated spatially (by one line) and temporally (by the refresh period between the end of the top field and the end of the bottom field). This temporal and spatial separation can result in display jitter, which is more noticeable during times of low motion. With this problem in mind, Section W.6.3.11 of Annex W suggests that Annexes N or U of H.263 can be used to predict from more than one previous field. For example, two or three previous fields can be used to form a prediction of the next field. In particular, the field (or fields) to be used for prediction can be chosen (according to Annexes N or U) such that each top field is always predicted from the previous top field (or fields) and each bottom field is always predicted from the previous bottom field (or fields). In this way, the top field can be coded and transmitted in a stream completely separate from the stream containing the bottom field. Using video information from the same field for prediction thus mitigates the picture quality problem described above.
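A sketch of this same-parity reference selection follows. The types and the select_reference helper are hypothetical stand-ins; the point is only that the newest previously coded field of matching parity is chosen as the predictor, with Intra coding as the fallback when no such field exists.

```c
#include <stddef.h>

typedef enum { TOP_FIELD, BOTTOM_FIELD } field_parity;

typedef struct {
    field_parity parity;       /* top or bottom                */
    int          frame_index;  /* capture order, for reference */
    /* ... decoded samples would live here ... */
} field_picture;

/* Choose a reference so that top fields are always predicted from
 * earlier top fields and bottom fields from earlier bottom fields.
 * Returns NULL when no same-parity reference exists yet, in which
 * case the field would be coded Intra. */
const field_picture *select_reference(const field_picture *history,
                                      int count, field_parity target)
{
    for (int i = count - 1; i >= 0; i--)   /* newest first */
        if (history[i].parity == target)
            return &history[i];
    return NULL;
}
```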

[0010] This field prediction scheme is also more resilient to errors. If one stream of video information is temporarily dropped, the other stream can continue. Since one field remains, there is always some video information to decode and display, albeit at a slower update rate.

[0011] Further, more than one processor may be used to more efficiently encode a video stream in a multiple-processor architecture. For example, one processor can code the stream of top fields, and a second processor can code the stream of bottom fields, where each processor is programmed to capture and encode either the top or bottom field of video information. Each processor may receive both the top and bottom field streams but decode only one. Conversely, the video conferencing system may be configured such that each processor only receives one of the field streams.
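A minimal sketch of such a field split is given below. The fixed-size queues and the dispatch_field routine are hypothetical stand-ins for the inter-processor hand-off; the essential behavior is simply that top and bottom fields are routed to different coding processors.

```c
#define QUEUE_DEPTH 8

typedef enum { TOP_FIELD, BOTTOM_FIELD } field_parity;

typedef struct {
    const void  *data;    /* captured field samples */
    field_parity parity;
} captured_field;

/* One work queue per coding processor. */
static captured_field top_queue[QUEUE_DEPTH], bottom_queue[QUEUE_DEPTH];
static int            top_count,             bottom_count;

/* Route each captured field to the processor dedicated to its parity
 * so that the two field streams are captured and coded independently.
 * Returns -1 if the destination queue is full. */
int dispatch_field(captured_field f)
{
    if (f.parity == TOP_FIELD) {
        if (top_count == QUEUE_DEPTH) return -1;
        top_queue[top_count++] = f;        /* processor 1 */
    } else {
        if (bottom_count == QUEUE_DEPTH) return -1;
        bottom_queue[bottom_count++] = f;  /* processor 2 */
    }
    return 0;
}
```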

[0012] Several shortcomings exist in the above-described systems. Firstly, dropped fields, caused by large amounts of motion or by transmission errors occurring in any one of the video signal transmission streams, can affect the quality of the displayed picture for an extended period of time. In such cases, the picture quality remains poor until the coding process recovers. For instance, if a field of information is lost during transmission for any reason, and a decoder signals an encoder to encode an “Intra” field (the use of Intra fields is described within the H.263 standard), the quality of that half of the picture (i.e., the lost field) will suffer for the period of time that it takes the encoder to recover from the error and/or encode the Intra frame.

[0013] Another shortcoming of prior art systems is that the field that the encoder begins encoding with (at startup) is indeterminate. The receiving video conference system does not know a priori whether the first frame to be received will begin with a top field or a bottom field. This is so because, at the transmitting video conference terminal, the video camera starts generating and sending fields of video information before the encoder is ready to receive the information. After the encoder is itself initialized, the encoder begins processing at the beginning of the next field it sees.

[0014] This situation can cause additional and unacceptable transmission delay. If the received video stream begins with the same field that the encoder was initialized to expect, there are no problems and no added delay in subsequent encoding. If, however, the encoder receives the opposite field from the one that is expected, the encoder will wait (i.e., delay) for as much as an entire field capture time (e.g., 16.7 milliseconds) in order to receive and store the expected field. This image delay will prevail for the entire video conferencing session. Such a systematic delay can lead to unacceptable meeting dynamics and misunderstood conversations.

[0015] In a dual-processor implementation, each processor is programmed to capture and encode either the top or the bottom field of video information (i.e., each processor receives both fields of video; however, both fields are not captured and encoded). Generally, at system start time, the encoder randomly sends either the top or bottom field of video information first. Specifically, at the time that the video conferencing system is started, either the top or the bottom field of video can be available to either of the two processors. This is because the video camera starts generating and sending fields of video information prior to the processors being ready to receive video information, and the processors will capture the first field that is available after initialization.

[0016] The first field that the decoder receives can be indeterminate for other reasons as well. For instance, bit errors contained in a field can also cause the field to be dropped at the decoder or lost in the network. At startup, an interrupt is generated by the decoder which has the effect of preparing the decoder to receive either the top or bottom field of video (specifically, the routine that services this interrupt determines which field the pointer will be initialized to). In some systems, one interrupt is generated every 16.7 msec (NTSC) or 20 msec (PAL), which is the period of time it takes to display one field of information. As a result of this interrupt, a display buffer pointer is set to a particular memory location. This location could, for instance, correspond to the first line (i.e., line 0) of the top field of video information. During normal operation, the display buffer pointer is changed by the processor whenever the processor services the interrupt. This interrupt is generated during a vertical blanking period (i.e., the period during which the monitor scanning moves back to the top of the display screen). The receipt and servicing of this interrupt results in the pointer being moved from a starting position (i.e., either the top or bottom field location) to a second position (i.e., either the bottom or top field location, respectively). Disadvantageously, if the first field that the encoder captures, encodes, and transmits is not the field that the decoder buffer pointer was initialized to, then the decoder must wait one full encoder capture period (e.g., 16.7 msec) for the next field to arrive. This wait adds 16.7 msec of end-to-end video delay to the system. When the total end-to-end video delay ranges from 150 to 200 msec due to bandwidth availability and network delay, removing 16.7 msec is significant.
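The interrupt-driven pointer behavior described above can be sketched as follows. The vblank_isr handler and the buffer variables are hypothetical; the sketch shows only that each vertical-blanking interrupt toggles the expected field parity and repoints the display buffer pointer, which is what makes a mismatched first field costly.

```c
typedef enum { TOP_FIELD, BOTTOM_FIELD } field_parity;

static unsigned char *top_buffer;     /* first line of the top field;      */
static unsigned char *bottom_buffer;  /* first line of the bottom field;   */
                                      /* both assigned at init (not shown) */
static unsigned char *display_ptr;    /* where the next field is stored    */
static field_parity   expected;       /* parity the pointer is set up for  */

/* Hypothetical vertical-blanking interrupt handler, invoked every
 * 16.7 msec (NTSC) or 20 msec (PAL): flip the expected parity and
 * repoint the display buffer pointer for the next field. */
void vblank_isr(void)
{
    expected    = (expected == TOP_FIELD) ? BOTTOM_FIELD : TOP_FIELD;
    display_ptr = (expected == TOP_FIELD) ? top_buffer : bottom_buffer;
}
```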

[0017] Since the first field that the decoder receives at the start of a video conference session is not determinate, the decoder may have to wait one field capture time (e.g., 16.7 msec) to store the next field in the display buffer, thereby delaying display of the image. This video image delay prevails for the entire video conferencing session.

[0018] One of the main problems with end-to-end video delay is that the delay affects video meeting dynamics. One example of a meeting dynamics problem arises when a local person makes a statement and watches a remote meeting participant while waiting for a response, and the response is delayed to the point that the local person is not sure whether or not the remote participant understood the statement. Another example arises when the local person is listening to a participant while also waiting for an opportunity to break in and ask a question. If, at the same time, a second remote person is also waiting to break in, in all probability the second remote person will do so before the local person is aware that the first remote participant has stopped talking. In effect, people interrupt one another during a meeting in an “uncontrolled” manner. This being the case, it is very desirable to have the end-to-end delay time be as short as possible, thereby giving the meeting as “natural” a feeling as possible.

[0019] Therefore, there is a need for a method that avoids the introduction of additional delay in a video conferencing session.

SUMMARY OF THE INVENTION

[0020] The present invention provides, in various embodiments, a method for decreasing end-to-end delay in a video conferencing context. According to one embodiment of the present invention, a processor is initialized to receive an initial field of video frame data having a first state. The processor receives an initial field of video frame data having either the first state or a second state. If the state of the initial field of video frame data is not the same as the state that the processor is initialized to, then a display buffer is adjusted by one display line, and the initial field of video frame data having the second state is stored in the display buffer.

[0021] According to another embodiment of the present invention, a method is provided for decreasing end-to-end delay in a video conferencing context, where at least one buffer pointer is initialized to either a first state or a second state to form a first initialized buffer pointer. The first state is associated with a top field of the video frame data, and the second state is associated with a bottom field of the video frame data. An initial field of video frame data is received having either the first state or the second state. If the state of the initial field of video frame data is not the same as the state of the first initialized buffer pointer, the state of the first initialized buffer pointer is toggled, and the first received field is stored into a buffer using the first initialized buffer pointer.

[0022] A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0023] The foregoing and other advantages of the invention will be appreciated more fully from the following further description thereof and with reference to the accompanying drawings, wherein:

[0024] FIG. 1 shows an exemplary video conferencing system that may be used with the present invention.

[0025] FIG. 2 shows an exemplary high-level schematic view of elements of a video conference terminal.

[0026] FIG. 3 shows a display format for PAL standards.

[0027] FIG. 4 shows a schematic representation of the organization of a video frame buffer where the display buffer pointer is initialized to the top field and the bottom field is received first.

[0028] FIG. 5 shows a schematic representation of the organization of a video frame buffer where the display buffer pointer is initialized to the bottom field and the top field is received first.

[0029] The use of the same reference symbols in different drawings indicates similar or identical items.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0030] Introduction

[0031] To provide an overall understanding of the present invention, certain illustrative embodiments will now be described in the context of an ITU Standard H.263 video conferencing system.

[0032] It will be understood by those of ordinary skill in the art that the methods and systems described herein may be suitably adapted to other video coding techniques, such as the Moving Picture Experts Group (MPEG) standards, Audio Video Interleave (AVI), or Multiple-Image Network Graphics (MNG). All such adaptations and modifications that would be clear to one of ordinary skill in the art are intended to fall within the scope of the invention described herein.

[0033] Furthermore, although the term “coding” is used herein, those of ordinary skill in the art will appreciate that the reciprocal decoding function is also implicated in the use of the present invention. Accordingly, all references to coding techniques are to be understood to include decoding techniques unless specifically identified otherwise.

[0034] As used herein, terms such as “image”, “image data”, “picture”, “picture data”, “video”, “video data”, and “video stream” are intended to refer generally to any form of video data, unless specifically stated otherwise. This includes reference images (which may, for example, be represented or described in terms of luminance and chrominance data), differential data, motion vectors, sequential identifiers, and any other coding and control information, whether relating to blocks, macro-blocks, frames, or any other partial or complete image representation, however encoded.

[0035] Referring to FIG. 1, an exemplary video conferencing system that may be used with the present invention is shown. In a video conferencing network 100, a rack 110 may include a multi-point conference unit (“MCU”) 120, a gateway 130, and hardware/software for other services 150. The gateway 130 may provide one or more connections to a Public Switched Telephone Network (“PSTN”) 160, for example, through high-speed connections such as Integrated Services Digital Network (“ISDN”) lines, T1 lines, or Digital Subscriber Lines (“DSL”). Multiple PSTN video conferencing terminals 170 may also be connected in a communicating relationship with the PSTN 160, and may be accessible using known telecommunications dialing and signaling services.

[0036] The MCU 120 may also be connected in a communicating relationship with a network 180. Multiple Internet Protocol (“IP”) video conferencing terminals 190 may also be connected in a communicating relationship with the network 180, and may be accessible using known data networking techniques, such as IP addressing.

[0037] It will be appreciated that, although the following description refers to the network 180 (e.g., an IP network such as the Internet) and the PSTN 160, any network for connecting terminals may be usefully employed according to the principles of the present invention. The network 180, for example, may be any packet-switched network, a circuit-switched network (such as an Asynchronous Transfer Mode (“ATM”) network), or any other network for carrying data, including the well-known Internet. The network 180 may also be the Internet, an extranet, a local area network, or other networks of networks known in the art. Further, the PSTN 160 may likewise be any circuit-switched network, or any other network for carrying circuit-switched signals or other data. It is additionally appreciated that the PSTN 160 and/or the network 180 may likewise include wireless portions, or may be completely wireless networks. Finally, the principles of the present invention may be usefully employed in any multimedia system.

[0038] It will also be appreciated that the components of the rack 110, such as the MCU 120, the gateway 130, and the other services 150, may be realized as separate physical machines, as separate logical machines on a single physical device, as separate processes on a single logical machine, or as some combination of these. Further, a single physical rack device is not required. Additionally, each component of the rack 110, such as the gateway 130, may comprise a number of separate physical machines grouped as a single logical machine, as for example, where traffic through the gateway 130 exceeds the data handling and processing power of a single machine. A distributed video conferencing network may include a number of racks 110, as indicated by ellipsis 192.

[0039] Each PSTN video conferencing terminal 170 may use an established telecommunications video conferencing standard such as H.320. Further, each IP video conferencing terminal 190 may use an established data networking video standard such as H.323. H.320 is an ITU-T standard for sending video and audio over the PSTN 160, and provides common formats for compatible audio/video inputs and outputs, and protocols that allow a multimedia terminal to utilize the communications links and synchronize audio and video signals. The T.120 standard may also be used to enable data sharing and collaboration. The ITU-T H.320 and T.120 standards are incorporated herein by reference in their entireties.

[0040] The gateway 130 may communicate with the PSTN 160, and may translate data and other media between a form that is compatible with the PSTN 160 and a form that is compatible with the network 180, including any protocol and media translations required to transport media between the networks.

[0041] Referring now to FIG. 2, some major components of a video conferencing terminal 200 suitable for use with either a PSTN or an IP network are shown in high-level schematic form. The terminal 200 may include input devices such as a video camera 210, a microphone 215, a keyboard (not shown), and a pointing device (not shown). The terminal 200 may also include output devices such as a speaker 220 and a display system 225. Those of ordinary skill in the conferencing arts will also recognize that additional input and output devices, including but not limited to overhead projectors, projection video systems, and whiteboards, may also be used as components within the video conferencing terminal 200.

[0042] The video conferencing terminal 200 also may contain analog-to-digital (“A/D”) converters 230 for converting analog input signals from one or more sources into a digital form for encoding. An audio coder/decoder (“codec”) 240, which may include A/D converter 230 functionality, encodes audio signals for transmission via a transmitter 260. Similarly, a video codec 250 performs analogous functions for video signals.

[0043] In an exemplary embodiment, the video codec 250 comprises separate encoders 252 and 254 for the top and bottom video fields, respectively, that make up each video frame. The video codec 250 may also include a field splitter 257, combiner/multiplexer 259 functions, and the A/D converter function 230, depending on the type and output signal characteristics of the camera 210. Typically, functional blocks 230, 257, 252, 254, and 259 are present in all video encoding systems, and the present description is intended only to convey a functional overview of video signal processing rather than a working schematic.

[0044] While those of ordinary skill in the art will readily recognize the function of a codec, as used herein the term “codec” is not limited to a device or subsystem that performs coding and decoding simultaneously. Instead, the term “codec” is herein used to refer to the aggregated functions of coding (or encoding) and decoding, which may be performed exclusively or in combination in one or more physical devices. Thus, in certain instances the term “encoder” (or its equivalent, “coder”) is used to connote the encoding function only. In other instances, the term “decoder” is used to connote the decoding function. In still other contexts, the term “codec” may be used as a generalization of either or both functions.

[0045] The video codec 250 and the audio codec 240 (and their counterpart codecs 251 and 241 in the receiving path of the terminal 200, respectively) provide standards-based conferencing according to the H.320 and T.120 standards for PSTN terminals or the H.323 standard for IP terminals. These standards may be implemented entirely in software on a computer (not shown), on dedicated hardware, or in some combination of both.

[0046] The terminal 200 also includes a receive path, comprised of a network receiver 270, the audio codec 241, and the video codec 251. The video codec 251 may include a display driver function, or that function may be implemented separately in a display driver 255, as illustrated. Likewise, the audio codec 241 may include a digital-to-analog (“D/A”) converter, or the D/A converter function may be provided externally, as in a D/A converter 245.

[0047] Referring to FIG. 1, the MCU 120 may communicate with the IP video conferencing terminals 190 over the network 180 or with the PSTN video conferencing terminals 170 over the PSTN 160. The MCU 120 may also include hardware and/or software implementing the H.323 standard (or the H.320 standard, where the MCU 120 is connected to the PSTN 160) and the T.120 standard, and may also include multipoint control for switching and multiplexing video, audio, and data streams in a multimedia conference. The MCU 120 may additionally include hardware and/or software to receive from, and transmit to, the PSTN video conferencing terminals 170 connected to the gateway 130.

[0048] The MCU 120 may reside on one of the racks 110 (as shown in FIG. 1) or may be located elsewhere in the network, as are the MCUs 120a and 120b. It will be appreciated that the MCU 120 may also reside in one of the PSTN video conferencing terminals 170, or one of the IP video conferencing terminals 190, and may be implemented in hardware, software, or some combination thereof.

[0049] The rack 110 may provide additional services for use in a video conference. These may include, for example, audio/video codecs that are not within the H.323 or H.320 standards, such as the G2 codec and streamer for use with a proprietary streaming system sold by RealNetworks, Inc., or a Windows Media codec for use with proprietary media systems sold by Microsoft Corporation. Other services may include, for example, a directory server, a conference scheduler, a database server, an authentication server, and a billing/metering system.

[0050] Video codecs may include codecs for standards such as H.261 FCIF, H.263 QCIF, H.263 FCIF, H.261 QCIF, and H.263 SQCIF. These video teleconferencing standards define different image size and quality parameters. Further, audio codecs may include codecs for standards such as G.711, G.722, G.722.1, and G.723.1. These audio teleconferencing standards define audio data parameters for audio transmission. Any other proprietary or non-proprietary standards currently known or that may be developed in the future for audio, video, and data may likewise be used with the present invention, and are intended to be encompassed by this description. For example, current H.320 devices typically employ monaural sound; however, the principles of the invention may be readily adapted to a conferencing system employing stereo coding and reproduction, or any other spatial sound representation. Each and every standard recited herein is hereby incorporated by reference in its entirety, including any and all appendices, annexes, and subparts thereof, as if it were set forth herein.

[0051] Delay Avoidance

[0052] Referring to FIG. 2, video conferencing delay is avoided by ensuring that each received field is stored in a local video buffer memory (i.e., at the transmit video conferencing terminal or the receive video conferencing terminal, as appropriate) without loss of any fields due to a mismatch between the initialized state of the video buffer pointer and the state of the first received video field. “State” in the context of this application refers to the association of both the video buffer pointer and the contents of a received video field with one of two types of fields (i.e., a top or a bottom field). A particular instance of a buffer pointer identifying a buffer location at which to begin storing the top field of a received video frame has a “top” state; an instance of a pointer identifying a buffer for the bottom field has a “bottom” state. Likewise, the video data first received after initialization of the camera 210 and the encoder 250 (or the receiver 270 and the decoder 251, at the receiving video conferencing terminal) is always the first line of either the top field or the bottom field, by common definition of the interlaced video standards. The first received datum in a given field (or the beginning of a field, generally) is thus referred to herein as having either a “top” or a “bottom” state, respectively.

[0053] At video conferencing system startup, both the video encoder 250 and the video decoder 251 are initialized to receive either a top field or a bottom field of video frame data. As part of this initialization, a display buffer pointer is set to a particular memory location at each video conferencing terminal (or “end” of the conference), corresponding, for example, to the first line of the top field of video information. A second display buffer and its associated pointer are maintained by a local processor for the bottom field. Alternatively, a second, separate processor can be employed to buffer alternating fields.

[0054] As field information is received by the video conferencing system (either from the local camera 210 or from a transmitting terminal), the data is temporarily stored (i.e., buffered) in the local display buffer. During normal operations, the display buffer pointer is changed by the processor during the vertical blanking period of each frame to reset the pointer to the beginning of the buffer in preparation for the next field. For example, if the first field received is a top field, the display buffer pointer must be reset to the beginning of the bottom field buffer after the top field has been displayed.

[0055] Regardless of the initial state of the display buffer pointer, if the first line of a new field arriving after initialization is not what was expected (i.e., does not match the field state of the buffer pointer), the present invention senses the state mismatch and dynamically resets the buffer pointer to point to the correct buffer. Since the buffer pointer has only two possible states (i.e., pointing to the top field or the bottom field), a dynamic reset can take the form of a state toggle.
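A minimal sketch of this sense-and-toggle step follows, assuming the parity of the arriving field has already been sensed (for example, from the picture layer, as described below). The resolve_pointer helper is illustrative:

```c
typedef enum { TOP_FIELD, BOTTOM_FIELD } field_parity;

/* Compare the sensed parity of the first field against the parity the
 * buffer pointer was initialized to; on mismatch, toggle the pointer
 * state instead of discarding the field and waiting a full field
 * period. Returns the buffer into which the field should be stored. */
unsigned char *resolve_pointer(field_parity received,
                               field_parity *initialized,
                               unsigned char *top_buf,
                               unsigned char *bottom_buf)
{
    if (received != *initialized)
        *initialized = received;   /* dynamic reset == state toggle */
    return (*initialized == TOP_FIELD) ? top_buf : bottom_buf;
}
```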

[0056] Referring to FIG. 3, an example of a display format for PAL standards 300 is shown. At video conferencing system startup, both the video encoder 250 (FIG. 2) and the video decoder 251 (FIG. 2) are initialized to receive either a top field 310 or a bottom field 320 of video frame data. As part of this initialization, a display buffer pointer is set to a particular memory location at each video conferencing terminal corresponding to the first line of a field of video information. A second display buffer and its associated pointer can be maintained by the local processor for the bottom field. Alternatively, a second, separate processor may be employed to buffer alternating fields.

[0057] The video processor senses the received field state when the video processor decodes the video and picture layer information. In particular, a PSUPP field in the picture layer of an H.263-compliant video signal contains, within the Picture Message (function type [FTYPE] 14), an indication of whether the field is the top field 310 or the bottom field 320. The PSUPP field is itself fully described in section W.6.3 of Annex W to the H.263 standard, and is thus well known to persons of ordinary skill in the art.
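A hedged sketch of sensing the field state from the decoded picture layer is shown below. The picture_message structure and its pre-parsed top_field_indicated flag are assumptions made for illustration; the actual PSUPP bit layout is defined in Annex W and is not reproduced here. Only the dispatch on function type 14 reflects the text above.

```c
#include <stdbool.h>

typedef enum { TOP_FIELD, BOTTOM_FIELD } field_parity;

/* Hypothetical decoded form of one Picture Message carried in PSUPP. */
typedef struct {
    int  ftype;                 /* Picture Message function type   */
    bool top_field_indicated;   /* assumed pre-parsed payload flag */
} picture_message;

/* Scan the picture-layer messages for the interlaced field indication
 * (FTYPE 14). Returns true and sets *parity when it is present. */
bool sense_field_state(const picture_message *msgs, int n,
                       field_parity *parity)
{
    for (int i = 0; i < n; i++) {
        if (msgs[i].ftype == 14) {
            *parity = msgs[i].top_field_indicated ? TOP_FIELD : BOTTOM_FIELD;
            return true;
        }
    }
    return false;   /* no indication found in this picture */
}
```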

[0058] As field information is received by the video conferencing terminal 170 (FIG. 1) from a local camera or from a transmitting terminal, the data may be temporarily stored (buffered) in the local display buffer by the terminal's video processor. During normal operations, the display buffer pointer is changed by the processor during the vertical blanking period of each frame to reset the pointer to the beginning of the buffer in preparation for the next field. For example, if the first field received is the top field 310, the display buffer pointer should be reset to the beginning of the bottom field buffer after the top field 310 has been displayed.

[0059] Regardless of the initial state of the display buffer pointer, if the first line of a new field arriving after initialization (the “initial field”) does not match the field state of the buffer pointer, the video processor senses the state mismatch and dynamically resets the buffer pointer to point to the correct buffer, examples of which are shown in FIGS. 4 and 5. Where, as in PAL or NTSC video, the buffer pointer has only two possible states (i.e., pointing to the top field or the bottom field), this “dynamic reset” can take the form of a state toggle.

[0060] The buffer pointer can be initialized to either a first state or a second state, the first state being associated with the top field 310 and the second state with the bottom field 320 of video frame data. Referring to FIG. 4, a schematic representation of the organization of an exemplary embodiment of a video frame buffer 411, where the display buffer pointer is initialized to the top field 310 (FIG. 3) and the bottom field 320 (FIG. 3) is received first, is shown. In this scenario, because the state of the initial field of video frame data is not the same as the state of the first initialized buffer pointer, the state of the first initialized buffer pointer is toggled 421, and the first received field is stored into a buffer using the first initialized buffer pointer. In other words, the processor will immediately cause the display buffer pointer to reposition the display lines such that the vertical spatial relationship between top and bottom lines is preserved. In this embodiment, the toggling 421 can be a change of state of the first initialized buffer pointer or the replacement of the first initialized buffer pointer with a second initialized buffer pointer having a state different from the state of the first initialized buffer pointer.

[0061] Other embodiments exist that are not heavily dependent on buffer pointers and their adjustment. In such a case, the processor is initialized to receive an initial field of video frame data having a first state, but the processor receives an initial field of video frame data having a second state. The display buffer is then adjusted by one display line, and the initial field of video frame data having the second state is stored into the display buffer. As shown in FIG. 4, the first state is the top field 310, and the second state is the bottom field 320. Therefore, the display buffer is adjusted down one display line. At this moment, the display buffer 411 is remapped for an additional position 422. The toggling 421, in this embodiment, is the adjustment of the display position of the fields by one line downward such that, although bottom field 320 lines go into top field 310 lines in the display buffer, the vertical spatial relationship between top field 310 and bottom field 320 lines is preserved.
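The one-line downward adjustment of this paragraph (FIG. 4) and the mirrored upward adjustment discussed with FIG. 5 below can be sketched together. The store_mismatched_field routine and its guard-line assumption are illustrative only:

```c
#include <string.h>

#define LINE_PIXELS 720   /* pixels per display line */

/* Store a field whose parity does not match the initialization by
 * shifting its display position one line, so the one-line vertical
 * offset between top and bottom lines is preserved. display_buffer is
 * assumed to point one line into storage that was remapped with a
 * spare line at each end (positions 422 and 522 in FIGS. 4 and 5),
 * keeping both the downward and the upward shift in bounds. */
void store_mismatched_field(unsigned char *display_buffer,
                            const unsigned char *field,
                            int field_lines,
                            int shift_down)   /* 1: FIG. 4, 0: FIG. 5 */
{
    long offset = shift_down ? LINE_PIXELS : -LINE_PIXELS;
    for (long i = 0; i < field_lines; i++)
        memcpy(display_buffer + offset + 2 * i * LINE_PIXELS,
               field + i * LINE_PIXELS,
               LINE_PIXELS);
}
```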

[0062] Referring to FIG. 5, a schematic representation of the organization of an exemplary embodiment of a video frame buffer 511, where the display buffer pointer is initialized to the bottom field 320 (FIG. 3) and the top field 310 (FIG. 3) is received first, is shown. In this embodiment, because the state of the initial field of video frame data is not the same as the state of the first initialized buffer pointer, the state of the first initialized buffer pointer is toggled 521, and the first received field is stored into a buffer using the first initialized buffer pointer. In other words, the processor will immediately cause the display buffer pointer to reposition the display lines such that the vertical spatial relationship between top and bottom lines is preserved. In this embodiment, the toggling 521 can be a change of state of the first initialized buffer pointer, or the replacement of the first initialized buffer pointer with a second initialized buffer pointer having a state different from the state of the first initialized buffer pointer.

[0063] Other embodiments exist that are not heavily dependent on buffer pointers and their adjustment. In such a case, the processor is initialized to receive an initial field of video frame data having a first state, but the processor receives an initial field of video frame data having a second state. As shown in FIG. 5, the first state is the bottom field 320 and the second state is the top field 310. Therefore, the display buffer is adjusted up one display line. At this moment, the buffer is remapped to add an additional position 522. The toggling 521 in this embodiment is the adjustment of the display position of the fields by one line upward such that, although top field 310 lines go into bottom field 320 lines in the display buffer, the vertical spatial relationship between top field 310 and bottom field 320 lines is preserved.

[0064] The method of the present invention may be performed in hardware, software, or any combination thereof, as those terms are currently known in the art. In particular, the present method may be carried out by software, firmware, or microcode operating on a computer or computers of any type. Additionally, software embodying the present invention may comprise computer instructions in any form (e.g., source code, object code, interpreted code, etc.) stored in any computer-readable medium (e.g., ROM, RAM, magnetic media, punched tape or card, compact disc (CD) in any form, DVD, etc.). Furthermore, such software may also be in the form of a computer data signal embodied in a carrier wave, such as that found within Web pages transferred among devices connected to the Internet. Accordingly, the present invention is not limited to any particular platform, unless specifically stated otherwise herein.

[0065] The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this disclosure. The scope of the invention should therefore be determined not with reference to the above description, but instead with reference to the appended claims along with their full scope of equivalents.

We claim:
1. A method of buffering video frame data in a video display system, comprising: initializing a processor to receive an initial field of video frame data having a first state; receiving an initial field of video frame data having a second state; adjusting a display buffer by one display line; and storing the initial field of video frame data having the second state into the display buffer.
2. The method of claim 1, wherein the first state is a top field and the second state is a bottom field.
3. The method of claim 2, wherein the display buffer is adjusted down by one display line.
4. The method of claim 1, wherein the first state is a bottom field and the second state is a top field.
5. The method of claim 4, wherein the display buffer is adjusted up by one display line.
6. The method of claim 1, further comprising sequentially storing subsequent received fields according to the state of each of the subsequent received fields in the display buffer.
7. The method of claim 1, further comprising displaying the video frame data by reading the received fields from the buffer.
8. A method of buffering video frame data in a video display system, comprising: initializing at least one buffer pointer to either a first state or a second state to form a first initialized buffer pointer; receiving an initial field of video frame data having either the first state or the second state; if a state of the initial field of video frame data is not the same as a state of the first initialized buffer pointer, toggling the state of the first initialized buffer pointer; and storing the received initial field of video frame data into a buffer using the first initialized buffer pointer.
9. The method of claim 8, wherein the first state is associated with a top field of the video frame data and the second state is associated with a bottom field of the video frame data.
10. The method of claim 8, wherein the toggling further comprises changing the state of the first initialized buffer pointer.
11. The method of claim 8, wherein the toggling further comprises replacing the first initialized buffer pointer with a second initialized buffer pointer having a state different from the state of the first initialized buffer pointer.
12. The method of claim 8, further comprising sequentially storing subsequent received fields according to the state of each of the subsequent received fields in the display buffer.
13. The method of claim 8, further comprising displaying the video frame data by reading the received fields from the display buffer.
14. An electronically readable medium having embodied thereon a program, the program being executable by a machine to perform method steps for use in buffering video frame data in a video display system, comprising: initializing a processor to receive an initial field of video frame data having a first state; receiving an initial field of video frame data having a second state; adjusting a display buffer by one display line; and storing the initial field of video frame data having the second state into the display buffer.
15. An electronically readable medium having embodied thereon a program, the program being executable by a machine to perform method steps for use in buffering video frame data in a video display system, comprising: initializing at least one buffer pointer to either a first state or a second state to form a first initialized buffer pointer; receiving an initial field of video frame data having either the first state or the second state; if a state of the initial field of video frame data is not the same as a state of the first initialized buffer pointer, toggling the state of the first initialized buffer pointer; and storing the received initial field of video frame data into a buffer using the first initialized buffer pointer.
16. The program executable by a machine of claim 15, wherein the toggling further comprises changing the state of the first initialized buffer pointer.
17. The program executable by a machine of claim 15, wherein the toggling further comprises replacing the first initialized buffer pointer with a second initialized buffer pointer having a state different from the state of the first initialized buffer pointer.
18. A video frame data buffering system in a video display system, comprising: means for initializing a processor to receive an initial field of video frame data having a first state; means for receiving an initial field of video frame data having a second state; means for adjusting a display buffer by one display line; and means for storing the initial field of video frame data having the second state into the display buffer.
19. A video frame data buffering system in a video display system, comprising: means for initializing at least one buffer pointer to either a first state or a second state to form a first initialized buffer pointer; means for receiving an initial field of video frame data having either the first state or the second state; means for toggling the state of the first initialized buffer pointer if a state of the initial field of video frame data is not the same as a state of the first initialized buffer pointer; and means for storing the received initial field of video frame data into a buffer using the first initialized buffer pointer.
20. A method of coding a video signal, comprising: initializing a codec to receive a video signal of a first type into a buffer; receiving a video signal of a second type; and adjusting parameters of the codec relating to storage of the video signal in the buffer to receive the video signal of the second type.
21. The method of claim 20, wherein the video signal of the first type is a top field and the video signal of the second type is a bottom field.
22. The method of claim 20, wherein the video signal of the first type is a bottom field and the video signal of the second type is a top field.
23. The method of claim 20, wherein the step of adjusting parameters of the codec comprises redirecting a pointer to a location in the buffer to a location one line down in the buffer.
24. The method of claim 20, wherein the step of adjusting parameters of the codec comprises remapping the buffer to map a new line before the line where a pointer to a location in the buffer is set after the step of initializing the codec, and storing the video signal of the second type in the new line.
25. The method of claim 20, wherein the step of receiving a video signal of a second type comprises receiving a video signal from a camera.
26. The method of claim 20, wherein the step of receiving a video signal of a second type comprises receiving a video signal from a remote video conferencing device.
27. A method of storing received video signals containing top fields and bottom fields, wherein the vertical spatial relationship between top fields and bottom fields is preserved even though the first field received was not the expected field, comprising: initializing a codec, wherein the codec is set to store first field data in the first line of a frame buffer memory; receiving second field data at the codec; adjusting the codec to receive the second field data, rather than the expected first field data; and storing the second field data in the frame buffer.
28. The method of claim 27, wherein the first field data is top field data and the second field data is bottom field data.
29. The method of claim 27, wherein the first field data is bottom field data and the second field data is top field data.
30. The method of claim 27, wherein the step of adjusting the codec comprises redirecting a pointer to the first line of the frame buffer memory to point one line down in the frame buffer memory.
31. The method of claim 27, wherein the step of adjusting the codec comprises remapping the frame buffer memory to add a new line before the first line set to store first field data, and storing the second field data in the new line.
32. The method of claim 27, wherein the step of receiving second field data at the codec comprises receiving second field data from a camera.
33. The method of claim 27, wherein the step of receiving second field data at the codec comprises receiving second field data from a remote video conferencing device.
34. A video conferencing device comprising a codec that, once initialized to store first field data upon receipt, is adjustable to store second field data instead of first field data.
35. The device of claim 34, wherein the first field data is top field data and the second field data is bottom field data.
36. The device of claim 34, wherein the first field data is bottom field data and the second field data is top field data.